Bagging and the Random Subspace Method for Redundant Feature Spaces
نویسندگان
چکیده
The performance of a single weak classifier can be improved by using combining techniques such as bagging, boosting and the random subspace method. When applying them to linear discriminant analysis, it appears that they are useful in different situations. Their performance is strongly affected by the choice of the base classifier and the training sample size. As well, their usefulness depends on the data distribution. In this paper, on the example of the pseudo Fisher linear classifier, we study the effect of the redundancy in the data feature set on the performance of the random subspace method and bagging.
منابع مشابه
Two credit scoring models based on dual strategy ensemble trees
Decision tree (DT) is one of the most popular classification algorithms in data mining and machine learning. However, the performance of DT based credit scoring model is often relatively poorer than other techniques. This is mainly due to two reasons: DT is easily affected by (1) the noise data and (2) the redundant attributes of data under the circumstance of credit scoring. In this study, we ...
متن کاملEnsemble classification based on generalized additive models
Generalized additive models (GAMs) are a generalization of generalized linear models (GLMs) and constitute a powerful technique which has successfully proven its ability to capture nonlinear relationships between explanatory variables and a response variable in many domains. In this paper, GAMs are proposed as base classifiers for ensemble learning. Three alternative ensemble strategies for bin...
متن کاملAttribute bagging: improving accuracy of classifier ensembles by using random feature subsets
We present attribute bagging (AB), a technique for improving the accuracy and stability of classi#er ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classi#ers ar...
متن کاملWeighted random subspace method for high dimensional data classification.
High dimensional data, especially those emerging from genomics and proteomics studies, pose significant challenges to traditional classification algorithms because the performance of these algorithms may substantially deteriorate due to high dimensionality and existence of many noisy features in these data. To address these problems, pre-classification feature selection and aggregating algorith...
متن کاملA New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001